06. Cleaning Column Labels

Cleaning Column Labels

1. Drop Extraneous Columns

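Drop features that are inconsistent between the datasets (that is, not present in both) or that are irrelevant to our questions. Use the pandas drop function.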
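
One way to find the features that are not present in both datasets is to compare the two sets of column labels. The sketch below uses tiny hypothetical stand-ins named df_08 and df_18 that hold only column labels (the "Model" column is illustrative); in the workspace you would load the real 2008 and 2018 datasets instead.

```python
import pandas as pd

# Hypothetical, heavily trimmed stand-ins for the two datasets.
# Only the column labels matter here, so the frames have no rows.
df_08 = pd.DataFrame(columns=["Model", "Sales Area", "Stnd",
                              "Underhood ID", "FE Calc Appr", "Unadj Cmb MPG"])
df_18 = pd.DataFrame(columns=["Model", "Cert Region", "Stnd",
                              "Stnd Description", "Underhood ID", "Comb CO2"])

cols_08 = set(df_08.columns)
cols_18 = set(df_18.columns)

# Labels that appear in only one of the two datasets
only_08 = sorted(cols_08 - cols_18)
only_18 = sorted(cols_18 - cols_08)

print("Only in 2008:", only_08)
print("Only in 2018:", only_18)
```

A set comparison only flags label mismatches: 'Sales Area' and 'Cert Region' show up here because they are named differently (step 2 renames rather than drops them), while 'Stnd' and 'Underhood ID' exist in both datasets and are dropped for relevance, which no label comparison can decide for you.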

Columns to drop:
  • In the 2008 dataset: 'Stnd', 'Underhood ID', 'FE Calc Appr', 'Unadj Cmb MPG'
  • In the 2018 dataset: 'Stnd', 'Stnd Description', 'Underhood ID', 'Comb CO2'
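
The drops listed above can be sketched with DataFrame.drop(columns=...). The frames below are hypothetical stand-ins with one row of placeholder values (the "Model" column is illustrative); in the workspace, apply the same calls to the loaded datasets.

```python
import pandas as pd

# Hypothetical, trimmed stand-ins for the two datasets (placeholder values)
df_08 = pd.DataFrame([[0, 1, 2, 3, 4, 5]], columns=[
    "Model", "Sales Area", "Stnd", "Underhood ID", "FE Calc Appr", "Unadj Cmb MPG"])
df_18 = pd.DataFrame([[0, 1, 2, 3, 4, 5]], columns=[
    "Model", "Cert Region", "Stnd", "Stnd Description", "Underhood ID", "Comb CO2"])

# Columns to drop, as listed above
drop_08 = ["Stnd", "Underhood ID", "FE Calc Appr", "Unadj Cmb MPG"]
drop_18 = ["Stnd", "Stnd Description", "Underhood ID", "Comb CO2"]

# drop() returns a new DataFrame by default, so reassign the result
df_08 = df_08.drop(columns=drop_08)
df_18 = df_18.drop(columns=drop_18)

print(list(df_08.columns))  # ['Model', 'Sales Area']
print(list(df_18.columns))  # ['Model', 'Cert Region']
```

Passing inplace=True, or using the older positional form df.drop(drop_08, axis=1), gives the same result; reassigning simply keeps each step explicit.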

2. Rename Columns

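  • Change the 'Sales Area' column label in the 2008 dataset to 'Cert Region' for consistency.
  • Rename all column labels by replacing spaces with underscores and converting everything to lowercase. (Underscores are easier to work with in Python than spaces. For example, a label containing spaces cannot be selected with df.column_name instead of df['column_name'], and inside query() it must be wrapped in backticks. Using lowercase and underscores consistently also makes column names easier to remember.)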
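
Both renaming steps can be sketched with DataFrame.rename: a dict for the single 'Sales Area' label, then a function applied to every label. The frames are hypothetical stand-ins for the datasets after the drop step; the "Model" and "City MPG" columns and their values are illustrative placeholders, and clean_label is a helper name chosen for this sketch.

```python
import pandas as pd

# Hypothetical stand-ins for the datasets after the drop step (placeholder values)
df_08 = pd.DataFrame([[0, 1, 2]], columns=["Model", "Sales Area", "City MPG"])
df_18 = pd.DataFrame([[0, 1, 2]], columns=["Model", "Cert Region", "City MPG"])

# 1) Align the 2008 label with the 2018 one
df_08 = df_08.rename(columns={"Sales Area": "Cert Region"})

# 2) Lowercase every label and replace spaces with underscores
def clean_label(label: str) -> str:
    return label.lower().replace(" ", "_")

# rename() accepts a function and calls it on each column label
df_08 = df_08.rename(columns=clean_label)
df_18 = df_18.rename(columns=clean_label)

print(list(df_08.columns))  # ['model', 'cert_region', 'city_mpg']

# Sanity check: both datasets now have identical column labels
print((df_08.columns == df_18.columns).all())  # True

# Attribute access and query() now work with plain names
print(df_08.cert_region.iloc[0])          # 1
print(len(df_08.query("city_mpg > 1")))   # 1
```

Checking that the two label sets match after renaming is a cheap way to confirm the datasets are ready to be combined later.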

Workspace

This section contains a workspace (for example, a Jupyter Notebook workspace or an online code editor workspace), which cannot be downloaded automatically and is therefore not included here. Please log in to the classroom with your account and download the workspace to your local machine manually. Note that for some courses, Udacity uploads the workspace files to https://github.com/udacity , so you may be able to download them there.

Workspace Information:

  • Default file path:
  • Workspace type: jupyter
  • Opened files (when workspace is loaded): n/a